This repository is from a study on legacy sediments conducted in eastern Maryland, southeastern PA, and northern Delaware. From the 17th to the 19th century, settlers constructed mill dams which accumulated deposits of fine sediments that entered the watershed as a result of deforestation and agriculture practices. When these dams are removed, sediment is released into streams by erosion processes and can affect the water quality in a number of ways.
Legacy sediment deposits contain both fine and coarse particles that can increase water turbidity. They may contain heavy metal deposits from local industries or increased nutrient concentrations from agriculture. They contain diverse microbial communities, some of which play an active role in nutrient cycling. The goal of the study was to examine the microbial community and determine nutrient and heavy metal content of legacy sediments to gain insight into the effects on water quality as the sediment deposits erode into streams.
Libraries & Themes
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(leaflet)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(viridis)
## Loading required package: viridisLite
library(fuzzyjoin)
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-7
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(grid)
theme_set(theme_minimal())
To read in the data:
coordinates <- read_csv("Data/Site_Coordinates.csv")
## Rows: 15 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): Site_Name_, Site_Abbreviation, Land_Use
## dbl (2): Lat, Long
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
sediment <- read_csv("Data/Sediment_Characteristics.csv")
## Rows: 67 Columns: 40
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Sample_Name, Classification
## dbl (38): M3-Fe_(mg/kg), M3-Al_(mg/kg), M3-P_(mg/kg), M3-K_(mg/kg), M3-Ca_(m...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
microbes <- read_csv("Data/Microbial_Community.csv")
## Rows: 88987 Columns: 75
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): Sequ, Kingdom, Phylum, Class, Order, Family, Genus
## dbl (68): BEB_102, BEB_24, BEB_72, BYR_32, BYR_64, BYR_16, BYR16-rep, BYR_48...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
I will begin with a quick look at the structure of the data.
glimpse(coordinates)
## Rows: 15
## Columns: 5
## $ Site_Name_ <chr> "Brynes_Mill", "Middle_Run", "Nature_Center_Beach", ~
## $ Site_Abbreviation <chr> "BYR", "MR", "NCB", "WM", "CB", "TM", "BZ", "CM", "C~
## $ Land_Use <chr> "Urban", "Forested", "Agriculture", "Urban", "Agricu~
## $ Lat <dbl> 39.6980, 39.7214, 39.7273, 39.6893, 39.7474, 39.7261~
## $ Long <dbl> -75.6652, -75.7304, -75.7663, -75.7493, -75.8828, -7~
glimpse(sediment)
## Rows: 67
## Columns: 40
## $ Sample_Name <chr> "BEB_24", "BEB_48", "B~
## $ `M3-Fe_(mg/kg)` <dbl> 112.6563, 143.4813, 17~
## $ `M3-Al_(mg/kg)` <dbl> 530.9874, 619.6947, 97~
## $ `M3-P_(mg/kg)` <dbl> 9.385351, 5.816617, 5.~
## $ `M3-K_(mg/kg)` <dbl> 82.98439, 79.13156, 40~
## $ `M3-Ca_(mg/kg)` <dbl> 166.23137, 201.29854, ~
## $ `M3-Mg_(mg/kg)` <dbl> 46.94637, 56.73758, 50~
## $ `M3-Mn_(mg/kg)` <dbl> 32.01672, 46.33603, 55~
## $ `M3-Zn_(mg/kg)` <dbl> 1.6280164, 0.7070331, ~
## $ `M3-Cu_(mg/kg)` <dbl> 1.797530, 1.263417, 4.~
## $ `M3-B_(mg/kg)` <dbl> 0.11429087, 0.09000000~
## $ `M3-S_(mg/kg)` <dbl> 6.743988, 5.120056, 14~
## $ `M3-Si_(mg/kg)` <dbl> 167.5609, 205.7853, 17~
## $ `NH4-N_(mg/kg)` <dbl> 1.0893730, 1.0603415, ~
## $ `NO3-N_(mg/kg)` <dbl> 4.481423, 4.132374, 3.~
## $ `As_(mg/kg)` <dbl> 1.706392, 2.572356, 2.~
## $ `Cd_(mg/kg)` <dbl> 0.1000000, 0.1000000, ~
## $ `Co_(mg/kg)` <dbl> 11.254589, 13.733531, ~
## $ `Cr_(mg/kg)` <dbl> 22.38719, 27.24183, 32~
## $ `Cu_(mg/kg)` <dbl> 25.18492, 25.09540, 64~
## $ `Ni_(mg/kg)` <dbl> 18.02035, 21.94805, 26~
## $ `P_(mg/kg)` <dbl> 301.1392, 356.5430, 50~
## $ `Pb_(mg/kg)` <dbl> 16.662127, 9.636053, 1~
## $ `Si_(mg/kg)` <dbl> 1905.485, 1720.756, 18~
## $ `Zn_(mg/kg)` <dbl> 58.47805, 63.20351, 85~
## $ d13C <dbl> -26.58185, -25.00202, ~
## $ `%C` <dbl> 0.6524649, 0.6053798, ~
## $ d15N <dbl> 3.019114, 5.017909, 12~
## $ `%N` <dbl> 0.0270606, 0.0394667, ~
## $ `Bank_Depth_(in)` <dbl> 24, 48, 72, 102, 16, 3~
## $ `Total_Depth_(in)` <dbl> 102, 102, 102, 102, 96~
## $ `%_Depth` <dbl> 24, 47, 71, 100, 17, 3~
## $ Classification <chr> "Top_Layer", "Upper_Mi~
## $ `%Sand` <dbl> 73.47, 56.63, 26.12, 3~
## $ `%Silt` <dbl> 23.093, 36.949, 63.600~
## $ `%Clay` <dbl> 3.437, 6.421, 10.280, ~
## $ `%Coarse` <dbl> 0.73, 0.57, 0.26, 0.37~
## $ `%_Fine` <dbl> 0.2700, 0.4337, 0.7388~
## $ `Nitrifier_Copy#_Total_(copy#/g_soil)` <dbl> 116784, 80723, 146477,~
## $ `Dentrifier_Copy#_Total_(NosZ)_(copy#_/g_soil)` <dbl> 227264, 182946, 491062~
glimpse(microbes)
## Rows: 88,987
## Columns: 75
## $ Sequ <chr> "AATCAGTTATAGTTTATTGATGGTACCTTACTACATGGATAACCGTAGTAATTCTA~
## $ Kingdom <chr> "Bacteria", "Bacteria", "Bacteria", "Bacteria", NA, "Bact~
## $ Phylum <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ Class <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ Order <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ Family <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ Genus <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ BEB_102 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BEB_24 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BEB_72 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BYR_32 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BYR_64 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BYR_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ `BYR16-rep` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BYR_48 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BYR_84 <dbl> 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, ~
## $ BZ_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BZ_30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ BZ_45 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, ~
## $ BZ_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_102 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_126 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_150 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ `CB30-rep` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_54 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CB_78 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CDM_24 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, ~
## $ CDM_48 <dbl> 0, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CDM_72 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CDM_96 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CM_20 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CM_40 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CM_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ CM_80 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ COB_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, ~
## $ COB_22 <dbl> 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ COB_7 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ `GMT-12-rep` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_20 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_28 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_36 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_42 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_50 <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ GMT_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ MR_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ MR_24 <dbl> 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ MR_36 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ MR_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_45 <dbl> 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_75 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ NCB_96 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ RH_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ RH_30 <dbl> 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ RH_45 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, ~
## $ RH_55 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM2_24 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM2_48 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM3_108 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM3_132 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM3_156 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ SM3_72 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TM_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, ~
## $ TM12rep <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TM_32 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TM_52 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TM_72 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ WM_12 <dbl> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ WM_24 <dbl> 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ WM_36 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ WM_48 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 5, 0, ~
The data set is made up of three files. ‘Coordinates’ is the latitude, longitude, and land-use type of the sample sites. ‘Sediment’ includes the characteristics of the samples including chemistry, depth, particle size, and other parameters. ‘Microbes’ is a table of DNA sequences, the taxonomy of the sequences, and the count of each sequence in each sample.
I will do a clean up on the column names to make things easier later.
coordinates <- clean_names(coordinates)
sediment <- clean_names(sediment)
microbes <- clean_names(microbes)
I will start with a look at where the sample sites are located. Sites represent different land-use types: urban (red), suburban (blue), agriculture (yellow), or forested (green).
coordinates %>%
leaflet() %>%
setView(-75.6, 39.6, 9.5) %>%
addProviderTiles(providers$Stamen.Terrain) %>%
addCircleMarkers(lng = ~long,
lat = ~lat,
weight = .5,
stroke = F,
fill = T,
fillOpacity = .8,
fillColor = c("red",
"green",
"yellow",
"red",
"yellow",
"yellow",
"red",
"blue",
"red",
"blue",
"blue",
"green",
"yellow",
"yellow",
"yellow"))
Each site contains a deposit of legacy sediment. When the structures containing these sediments are removed, stream water begins to erode the sediment, releasing it into the watershed and leaving behind a tall, steep bank like this one at the Scott’s Mill site. Here you can also see the remnants of the mill dam.
knitr::include_graphics("ScottsMill.jpg")
At each sample site, cores were collected at various depths in the legacy sediment. Each sample represents one core at one depth. At least three cores were taken at each site.
summary(sediment)
## sample_name m3_fe_mg_kg m3_al_mg_kg m3_p_mg_kg
## Length:67 Min. : 89.53 Min. : 348.7 Min. : 0.4015
## Class :character 1st Qu.:157.51 1st Qu.: 677.5 1st Qu.: 3.8424
## Mode :character Median :244.74 Median : 834.2 Median : 6.5928
## Mean :253.74 Mean : 826.7 Mean :10.9028
## 3rd Qu.:325.07 3rd Qu.: 986.1 3rd Qu.:12.5098
## Max. :709.36 Max. :1258.9 Max. :51.4531
## m3_k_mg_kg m3_ca_mg_kg m3_mg_mg_kg m3_mn_mg_kg
## Min. : 11.42 Min. : 44.89 Min. : 20.98 Min. : 2.019
## 1st Qu.: 26.38 1st Qu.: 214.56 1st Qu.: 60.15 1st Qu.: 26.398
## Median : 36.61 Median : 326.06 Median : 81.48 Median : 45.697
## Mean : 44.43 Mean : 435.80 Mean : 128.08 Mean : 63.748
## 3rd Qu.: 51.67 3rd Qu.: 483.82 3rd Qu.: 144.50 3rd Qu.: 88.588
## Max. :189.54 Max. :1692.70 Max. :1335.41 Max. :310.515
## m3_zn_mg_kg m3_cu_mg_kg m3_b_mg_kg m3_s_mg_kg
## Min. : 0.5132 Min. : 0.6889 Min. :0.04067 Min. : 1.761
## 1st Qu.: 0.9486 1st Qu.: 1.6362 1st Qu.:0.09000 1st Qu.: 8.862
## Median : 1.4666 Median : 2.2956 Median :0.09000 Median :13.601
## Mean : 4.7125 Mean : 3.0903 Mean :0.14634 Mean :16.017
## 3rd Qu.: 3.2857 3rd Qu.: 3.6656 3rd Qu.:0.14186 3rd Qu.:18.905
## Max. :62.8682 Max. :13.1661 Max. :1.00320 Max. :59.471
## m3_si_mg_kg nh4_n_mg_kg no3_n_mg_kg as_mg_kg
## Min. : 80.56 Min. : 0.0650 Min. :0.7341 Min. :1.000
## 1st Qu.:233.45 1st Qu.: 0.2397 1st Qu.:2.8231 1st Qu.:2.126
## Median :281.80 Median : 1.0017 Median :3.1251 Median :2.922
## Mean :281.46 Mean : 3.8952 Mean :3.2845 Mean :3.314
## 3rd Qu.:322.57 3rd Qu.: 1.5703 3rd Qu.:3.6932 3rd Qu.:4.309
## Max. :456.50 Max. :175.1887 Max. :8.2501 Max. :7.160
## cd_mg_kg co_mg_kg cr_mg_kg cu_mg_kg
## Min. :0.0500 Min. : 1.189 Min. :12.39 Min. : 3.396
## 1st Qu.:0.1000 1st Qu.: 9.729 1st Qu.:27.73 1st Qu.:18.941
## Median :0.1000 Median :13.446 Median :35.51 Median :30.209
## Mean :0.1528 Mean :13.369 Mean :36.23 Mean :32.366
## 3rd Qu.:0.1000 3rd Qu.:17.584 3rd Qu.:43.26 3rd Qu.:41.792
## Max. :1.1908 Max. :38.650 Max. :69.61 Max. :88.175
## ni_mg_kg p_mg_kg pb_mg_kg si_mg_kg
## Min. : 2.764 Min. : 42.87 Min. : 2.718 Min. : 510.6
## 1st Qu.:17.071 1st Qu.: 260.82 1st Qu.: 8.719 1st Qu.:1259.6
## Median :23.593 Median : 372.41 Median : 13.155 Median :1624.6
## Mean :24.507 Mean : 411.83 Mean : 21.894 Mean :1539.1
## 3rd Qu.:29.666 3rd Qu.: 516.55 3rd Qu.: 24.093 3rd Qu.:1770.1
## Max. :81.149 Max. :1084.94 Max. :142.845 Max. :2492.9
## zn_mg_kg d13c percent_c d15n
## Min. : 12.08 Min. :-28.13 Min. :0.1002 Min. : 0.000
## 1st Qu.: 50.80 1st Qu.:-26.49 1st Qu.:0.5747 1st Qu.: 3.958
## Median : 67.44 Median :-25.94 Median :0.8925 Median : 7.645
## Mean : 98.14 Mean :-25.94 Mean :1.1642 Mean : 7.234
## 3rd Qu.: 91.94 3rd Qu.:-25.07 3rd Qu.:1.2754 3rd Qu.: 9.765
## Max. :1060.51 Max. :-24.34 Max. :4.5164 Max. :14.254
## percent_n bank_depth_in total_depth_in percent_depth
## Min. :0.00000 Min. : 12.00 Min. : 36.00 Min. : 15.00
## 1st Qu.:0.03036 1st Qu.: 26.00 1st Qu.: 66.00 1st Qu.: 33.00
## Median :0.06135 Median : 48.00 Median : 96.00 Median : 55.00
## Mean :0.07098 Mean : 52.63 Mean : 91.61 Mean : 57.34
## 3rd Qu.:0.09057 3rd Qu.: 72.00 3rd Qu.:108.00 3rd Qu.: 77.00
## Max. :0.32707 Max. :156.00 Max. :162.00 Max. :100.00
## classification percent_sand percent_silt percent_clay
## Length:67 Min. : 0.00 Min. : 8.432 Min. : 0.729
## Class :character 1st Qu.:26.29 1st Qu.:36.818 1st Qu.: 5.514
## Mode :character Median :39.56 Median :52.440 Median : 7.614
## Mean :39.95 Mean :51.734 Mean : 8.318
## 3rd Qu.:58.15 3rd Qu.:63.168 3rd Qu.:11.065
## Max. :90.84 Max. :84.350 Max. :19.760
## percent_coarse percent_fine
## Min. :0.0000 Min. :0.09161
## 1st Qu.:0.2600 1st Qu.:0.41855
## Median :0.4000 Median :0.60440
## Mean :0.4001 Mean :0.60052
## 3rd Qu.:0.5850 3rd Qu.:0.73715
## Max. :0.9100 Max. :1.00000
## nitrifier_copy_number_total_copy_number_g_soil
## Min. : 0
## 1st Qu.: 10875
## Median : 46183
## Mean : 528766
## 3rd Qu.: 142921
## Max. :11252139
## dentrifier_copy_number_total_nos_z_copy_number_g_soil
## Min. : 0
## 1st Qu.: 115814
## Median : 483692
## Mean : 1622050
## 3rd Qu.: 1586768
## Max. :21362220
Depths ranged from 12 inches at the shallowest to 156 inches at the deepest. These cores were analyzed for various chemical, physical, and genetic parameters.
The sediment core samples contain millions of microbes, each containing a DNA sequence. This genetic material is analyzed by high throughput sequencing which produces tens of thousands of DNA sequences. While this covers only a portion of the total microbial DNA, it is considered representative of the microbial community and allows for comparisons among samples. The ‘microbes’ table includes these sequences and the number present in each sample. It also includes the taxonomy of the sequences, identified by comparing sequences to a database of previously identified sequences provided by the Ribosomal Database Project.
Some sequences cannot be identified, and some can only be identified to a certain taxonomic level. For example, a sequence may be a close enough match to the database to identify its family but not its genus. Unidentified taxonomic levels are denoted as NA.
microbes %>%
summarize(across(everything(), ~sum(is.na(.))))
## # A tibble: 1 x 75
## sequ kingdom phylum class order family genus beb_102 beb_24 beb_72 byr_32
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 0 399 41271 49984 62449 74535 82561 0 0 0 0
## # ... with 64 more variables: byr_64 <int>, byr_16 <int>, byr16_rep <int>,
## # byr_48 <int>, byr_84 <int>, bz_15 <int>, bz_30 <int>, bz_45 <int>,
## # bz_60 <int>, cb_102 <int>, cb_126 <int>, cb_150 <int>, cb_30 <int>,
## # cb30_rep <int>, cb_54 <int>, cb_78 <int>, cdm_24 <int>, cdm_48 <int>,
## # cdm_72 <int>, cdm_96 <int>, cm_20 <int>, cm_40 <int>, cm_60 <int>,
## # cm_80 <int>, cob_15 <int>, cob_22 <int>, cob_7 <int>, gmt_12 <int>,
## # gmt_12_rep <int>, gmt_20 <int>, gmt_28 <int>, gmt_36 <int>, ...
Notice that the number of NAs increases with each taxonomic level. Also, the taxonomic identification stops at the genus level rather than attempting to identify the species of the sequences.
To organize the table, I will arrange it by the most abundant sequences. I also want to remove sequences that do not have an identified kingdom or phylum. I will also remove chloroplasts which are an artifact and should not be included in analyses. I will also remove rows with a zero total that are present because this table was originally part of larger project. In addition, some of the column names are R functions, so I will add a _m (for microbes) suffix
microbes_clean <-
microbes %>%
rename("class_m" = "class", "order_m" = "order", "family_m" = "family") %>%
mutate(total = rowSums(.[8:75])) %>%
filter(kingdom != is.na(kingdom),
phylum != is.na(phylum),
!(class_m %in% "Chloroplast"),
total > 0) %>%
arrange(desc(total))
One way of looking at a microbial community is species richness, the total number of species in each sample. This will give you an idea of the range of microbial diversity across sites. For this calculation, I will count each sequence in the microbes table as a different species. (In reality, the same species might be represented multiple times in the table, with slight differences between the sequences).
microbes_clean %>%
summarise(across(c(8:75), ~sum(as.numeric(.x != 0)))) %>%
pivot_longer(cols = everything(),
names_to = "sample_name",
values_to = "richness") %>%
ggplot(aes(x = richness,
y = reorder(sample_name, richness))) +
geom_bar(stat = "identity", fill = "#00C1AA", color = "black") +
labs(x = "Species Richness",
y = "Sample Name")
I will determine which are the most abundant phyla, and see how they compare across samples. I will collapse the sequences into phyla then graph the phyla table to observe the abundance.
# Collapse sequences into phyla
phylum_df <-
microbes_clean %>%
group_by(phylum) %>%
summarize(across(where(is.double), ~sum(.))) %>%
ungroup() %>%
arrange(desc(total))
# Graph the phyla
phylum_df %>%
select(phylum, total) %>%
ggplot(aes(x = total,
y = reorder(phylum, total))) +
geom_col() +
labs(x = "Total Sequences",
y = "Phylum")
I will take a look at how the proportions of the top ten most abundant phyla vary in different samples.
# Select phyla with >= 50000 sequences
phylum_top <-
phylum_df %>%
filter(total >= 50000)
# Reorganize table
phylum_top_long <-
phylum_top %>%
pivot_longer(!phylum,
names_to = "sample_name",
values_to = "count_of_seq"
) %>%
mutate(sample_name = str_replace_all(string = c(.$sample_name),
pattern = "sm2",
replacement = "smtwo")) %>%
separate(col = sample_name,
into = c("site_abbreviation"),
sep = "\\d",
remove = F) %>%
mutate(site_abbreviation = toupper(site_abbreviation)) %>%
mutate(site_abbreviation = str_replace_all(string = c(.$site_abbreviation),
pattern = "SM",
replacement = "SM3")) %>%
mutate(site_abbreviation = str_replace_all(string = c(.$site_abbreviation),
pattern = "SM3TWO_",
replacement = "SM2")) %>%
mutate(sample_name = str_replace_all(string = c(sample_name),
pattern = "smtwo",
replacement = "sm2")) %>%
mutate(site_abbreviation = str_remove_all(string = c(site_abbreviation),
pattern = "_"))
## Warning: Expected 1 pieces. Additional pieces discarded in 680 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
# Graph phyla by sample
phylum_top_long %>%
ggplot(aes(fill = phylum,
y = count_of_seq,
x = sample_name)) +
geom_bar(position = "fill", stat = "identity") +
labs(y = "Proportion of Phylum",
x = "Sample Name") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_viridis_d()
Acidobacteria and Proteobacteria are the dominant phyla across all samples. Acidobacteria are typically known to be prevalent and often dominant in soils. Proteobacteria is a diverse group that includes bacteria involved in nitrogen cycling, including nitrogen-fixing and ammonia-oxidizing. I will take a look at these and some other highly abundant phyla in comparison to sediment parameters later.
To begin examining sediment parameters, I need to join the tables.
sed_joined <-
sediment %>%
separate(col = sample_name,
into = "site_abbreviation",
sep = "_",
remove = F) %>%
full_join(
x = .,
y = coordinates,
by = c("site_abbreviation" = "site_abbreviation")
)
## Warning: Expected 1 pieces. Additional pieces discarded in 67 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
There are too many parameters to compare all combinations of them. Instead, I am most interested in how each parameter varies by land use and depth. I will look first at the parameters at each site, and for each land use type.
sed_sl <-
sed_joined %>%
select("site_name", "land_use")
sed_sl <-
colnames(sed_sl)
sed_continuous <-
sed_joined %>%
select(-"sample_name", -"site_abbreviation", -"classification", -"site_name", -"land_use", -"lat", -"long")
sed_continuous <-
colnames(sed_continuous)
sed_combos <-
expand_grid(field_1 = sed_sl,
field_2 = sed_continuous)
map2(.x = sed_combos$field_1,
.y = sed_combos$field_2,
~ sed_joined %>%
ggplot(aes(x = .data[[.x]],
y = .data[[.y]],
color = land_use)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
##
## [[20]]
##
## [[21]]
##
## [[22]]
##
## [[23]]
##
## [[24]]
##
## [[25]]
##
## [[26]]
##
## [[27]]
##
## [[28]]
##
## [[29]]
##
## [[30]]
##
## [[31]]
##
## [[32]]
##
## [[33]]
##
## [[34]]
##
## [[35]]
##
## [[36]]
##
## [[37]]
##
## [[38]]
##
## [[39]]
##
## [[40]]
##
## [[41]]
##
## [[42]]
##
## [[43]]
##
## [[44]]
##
## [[45]]
##
## [[46]]
##
## [[47]]
##
## [[48]]
##
## [[49]]
##
## [[50]]
##
## [[51]]
##
## [[52]]
##
## [[53]]
##
## [[54]]
##
## [[55]]
##
## [[56]]
##
## [[57]]
##
## [[58]]
##
## [[59]]
##
## [[60]]
##
## [[61]]
##
## [[62]]
##
## [[63]]
##
## [[64]]
##
## [[65]]
##
## [[66]]
##
## [[67]]
##
## [[68]]
##
## [[69]]
##
## [[70]]
##
## [[71]]
##
## [[72]]
##
## [[73]]
##
## [[74]]
##
## [[75]]
##
## [[76]]
A few interesting patterns can be observed. Urban land use showed higher levels of several heavy metals including iron (fe), zinc (zn), copper (cu), cadmium, (cd), chromium (cr), lead (pb), and nickel (ni). The site near the Brandywine Zoo, an urban site, showed noticeably high levels of phosphorus (p), calcium (ca), copper (cu), boron (b), silicon (si), cadmium (cd), lead (pb), and nickel (ni). Other high parameters included potassium (k) at Tweedes Mill and sulfur (s) at Woolen Mill. The chemical levels at urban sites may have been influenced by current or historical industry. The BZ site is downstream of the Dupont Experimental Station and former gunpowder mills, while Woolen Mill is located near a former paper mill.
I am also interested in how these parameters vary by depth.
sed_combos_d <-
expand_grid(field_1 = "bank_depth_in",
field_2 = sed_continuous)
map2(.x = sed_combos_d$field_1,
.y = sed_combos_d$field_2,
~ sed_joined %>%
ggplot(aes(x = .data[[.x]],
y = .data[[.y]])) +
geom_point() +
geom_smooth() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
)
## [[1]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[2]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[3]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[4]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[5]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[6]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[7]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[8]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[9]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[10]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[11]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[12]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[13]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[14]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[15]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[16]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[17]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[18]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[19]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[20]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[21]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[22]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[23]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[24]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[25]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[26]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[27]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[28]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[29]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[30]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[31]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[32]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[33]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[34]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[35]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[36]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[37]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
##
## [[38]]
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Depth appears to have very little impact on any of the measured parameters, with the possible exception of the number of denitrifying bacteria in the sediment. To take a closer look, I will split up the land use.
sed_joined %>%
ggplot(aes(x = bank_depth_in,
y = dentrifier_copy_number_total_nos_z_copy_number_g_soil,
color = land_use)) +
geom_point() +
geom_smooth() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Some of the agricultural sites have higher denitrifying bacteria in deeper soils, but it is not a strong trend.
To make comparisons among the microbial community and sediment parameters, I will join the tables and remove a few resulting NAs - either from sequencing samples that did not work but have sediment data, or from replicate sequencing samples that do not have sediment data.
phylum_top_long <-
phylum_top_long %>%
mutate(sample_name = toupper(sample_name))
phylum_sed <-
full_join(x = phylum_top_long,
y = sed_joined,
by = c("sample_name" = "sample_name")) %>%
filter(phylum != is.na(phylum),
site_abbreviation.y != is.na(site_abbreviation.y))
I will see if there are any interesting trends in the microbial community related to depth.
phylum_sed %>%
mutate(bank_depth_in_c = as.character(bank_depth_in)) %>%
mutate(bank_depth_in_c = fct_reorder(bank_depth_in_c, bank_depth_in)) %>%
ggplot(aes(fill = phylum,
y = count_of_seq,
x = bank_depth_in_c)) +
geom_bar(position = "fill", stat = "identity") +
labs(y = "Proportion of Phylum",
x = "Bank Depth(in)") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_viridis_d()
As with the sediment parameters, there does not appear to be a trend related to depth.
Does the community differ in the various land uses?
phylum_sed %>%
ggplot(aes(fill = phylum,
y = count_of_seq,
x = land_use)) +
geom_bar(position = "fill", stat = "identity") +
labs(y = "Proportion of Phylum",
x = "Land Use") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_fill_viridis_d()
Land use also does not show a trend.
Bivariate analyses may be limited in their ability to show trends in the microbial community due to the size and diversity of the data set. A better way to make comparisons is to use non-metric multidimensional scaling, or NMDS. In this type of plot, samples are grouped based on their similarity. Samples with microbial communities that are highly similar will group together tightly, while samples with different communities will spread farther apart. First, I need to arrange the two data frames to match exactly, then run and plot the NMDS.
microbes_nmds <-
microbes_clean %>%
select(where(is.double), -"total") %>%
select(!contains("rep"))
sed_joined_nmds <-
sed_joined %>%
filter(!(sample_name %in% c("BEB_48", "SM2_72", "SM2_88"))) %>%
arrange(sample_name)
set.seed(1234)
NMDS <- metaMDS(t(microbes_nmds), distance = "bray", k = 3, maxit = 999, trymax = 100, wascores = TRUE)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1341781
## Run 1 stress 0.1329398
## ... New best solution
## ... Procrustes: rmse 0.06898779 max resid 0.277018
## Run 2 stress 0.1328998
## ... New best solution
## ... Procrustes: rmse 0.05341768 max resid 0.2661771
## Run 3 stress 0.1345004
## Run 4 stress 0.1395962
## Run 5 stress 0.1341777
## Run 6 stress 0.1334577
## Run 7 stress 0.1329381
## ... Procrustes: rmse 0.05320255 max resid 0.2658913
## Run 8 stress 0.1323873
## ... New best solution
## ... Procrustes: rmse 0.01373395 max resid 0.09102203
## Run 9 stress 0.1329403
## Run 10 stress 0.1329835
## Run 11 stress 0.134365
## Run 12 stress 0.1342696
## Run 13 stress 0.1346905
## Run 14 stress 0.1329659
## Run 15 stress 0.1451886
## Run 16 stress 0.1348137
## Run 17 stress 0.1323153
## ... New best solution
## ... Procrustes: rmse 0.006572997 max resid 0.04449178
## Run 18 stress 0.1329377
## Run 19 stress 0.1403864
## Run 20 stress 0.1340561
## Run 21 stress 0.1395979
## Run 22 stress 0.1329362
## Run 23 stress 0.1329376
## Run 24 stress 0.1346427
## Run 25 stress 0.1352463
## Run 26 stress 0.1382703
## Run 27 stress 0.1348743
## Run 28 stress 0.1351696
## Run 29 stress 0.1368977
## Run 30 stress 0.1342362
## Run 31 stress 0.1345656
## Run 32 stress 0.1452174
## Run 33 stress 0.1346929
## Run 34 stress 0.1329805
## Run 35 stress 0.1347024
## Run 36 stress 0.1323873
## ... Procrustes: rmse 0.00657485 max resid 0.0445159
## Run 37 stress 0.1403859
## Run 38 stress 0.1352528
## Run 39 stress 0.1346209
## Run 40 stress 0.1344438
## Run 41 stress 0.1368952
## Run 42 stress 0.1341332
## Run 43 stress 0.1368924
## Run 44 stress 0.1396964
## Run 45 stress 0.135565
## Run 46 stress 0.1382159
## Run 47 stress 0.1400401
## Run 48 stress 0.1323154
## ... Procrustes: rmse 1.963119e-05 max resid 9.246983e-05
## ... Similar to previous best
## *** Solution reached
## Warning in postMDS(out$points, dis, plot = max(0, plot - 1), ...): skipping
## half-change scaling: too few points below threshold
NMDS$stress
## [1] 0.1323153
nmds_scores <- as.data.frame(scores(NMDS))
nmds_scores$sample_name <- rownames(nmds_scores)
nmds_scores$Type <- c(sed_joined_nmds$land_use)
ggplot() +
geom_point(data=nmds_scores,aes(x=NMDS1,y=NMDS2, shape=Type, color=Type),size=3) +
geom_text(data=nmds_scores,aes(x=NMDS1,y=NMDS2,label=sample_name),size=2,vjust=0, hjust=-.1) +
coord_fixed() +
scale_shape_manual(values=c(15, 15, 15, 15)) +
scale_color_manual(values=c("yellow", "green", "blue", "red")) +
labs(caption = "Stress Value = 0.1323153")
Here I can see some more interesting trends than with the phylum graphs. Agriculture, Forested, and Suburban sites are generally similar to each other and group together on the graph. Urban sites are more different. The Brandywine Zoo and Cooches Bridge fall away from the other samples, although surprisingly, some of the Tweedes Mill sites appear to be similar to the Brandywine. The stress value is an indicator of the reliability of the NMDS plot. Values less than 0.2 are acceptable and the lower the value, the more reliable the plot.
The plot above shows two of the NMDS axes. NMDS plots can make use of three or more axes. I can use plotly to construct a 3D version of the NMDS and color it by land use.
plot_ly(x=nmds_scores[,1], y=nmds_scores[,2], z=nmds_scores[,3], type="scatter3d", mode="markers", color=sed_joined_nmds$land_use, colors=c("yellow", "green","blue", "red"))
To get a clearer picture of how the sediment characteristics influence the microbial community, I can fit them as vectors onto the NMDS. The longer the vector arrow, the more the pattern of the NMDS plot is driven by that parameter.
sed_vectors <-
sed_joined_nmds %>%
select(-sample_name, -site_abbreviation, -site_name, -lat, -long) %>%
select(-c(total_depth_in:percent_fine))
vectors_nmds <- envfit(NMDS, sed_vectors)
plot(NMDS, type = "n")
points(x = nmds_scores$NMDS1, y = nmds_scores$NMDS2, pch=15, col=c("Agriculture" = "yellow", "Forested" = "green", "Suburban" = "blue", "Urban" = "red")[sed_joined_nmds$land_use])
plot(vectors_nmds, col="black", p.max=0.05, cex = 0.75)
Layering the vectors makes it more clear that the pattern of the Urban samples is influenced by the concentrations of metals as well as nitrogen and nitrifying bacteria. Carbon and aluminum are also driving other aspects of the pattern for other land use types.
These data suggest that legacy sediments could have a variety of impacts on stream ecosystems. Microbial communities vary by location and land use. Nitrifying and denitrifying microbes are abundant in legacy sediments. These microbes play an important role in nutrient cycling, and could affect nitrogen levels in streams as they move between sediment deposits and water. Metals and nutrients are observed in sediments and in addition to their effect on microbial community composition, their presence is a concern for water quality. Further exploration into the interaction between microbes and nutrients in legacy sediments could be beneficial to future dam removal plans.